Abstract

RNA folding is a kinetic process governed by the competition of a large 
number of structures stabilized by the transient formation of base pairs that 
may induce complex folding pathways and the formation of misfolded structures.
Despite its importance in modern biophysics, the current understanding
of RNA folding kinetics is limited by the complex interplay between
the weak base-pair interactions that stabilize the native structure and the
disordering effect of thermal forces. The possibility of mechanically pulling
individual molecules offers a new perspective to understand the folding of 
nucleic acids. Here we investigate the folding and misfolding mechanism in 
RNA secondary structures pulled by mechanical forces. We introduce a model 
based on the identification of the minimal set of structures that reproduce the 
patterns of force-extension curves obtained in single molecule experiments. 
The model requires only two fitting parameters: the attempt frequency at the
level of individual base pairs and a parameter associated with a free energy correction
that accounts for the configurational entropy of an exponentially large
number of neglected secondary structures. We apply the model to interpret 
results recently obtained in pulling experiments in the three-helix junction S15 
RNA molecule (RNAS15). We show that RNAS15 undergoes force-induced 
misfolding where force favors the formation of a stable non-native hairpin. 
The model reproduces the pattern of unfolding and refolding force-extension 
curves, the distribution of breakage forces and the misfolding probability obtained
in the experiments.

1 Introduction



Like proteins, RNAs have enzymatic, regulatory and structural functions that are crucial 
for the correct operation of cells [1] [2]. RNA molecules are found in single-stranded
form and are designed to fold into specific three-dimensional conformations, called native 
states. RNA folding is a kinetic process mainly governed by the interactions between 
complementary bases which can lead to the formation of both native and non-native 
domains. As a result, folding into states that are structurally different from the native 
state, usually referred to as misfolding, can occur [3]. Misfolded RNAs are not functional and
can be harmful to organisms [4], just as misfolded proteins (e.g. prions) that are involved 
in several diseases [5]. Folding of biomolecules, such as RNA molecules and proteins, is 
therefore a subject of great importance in modern biophysics. Under which conditions
is misfolding prone to occur? What are the structural elements that prevent folding into
the native structure? Is it possible to control misfolding by designing specific molecular
sequences? To answer such questions, modeling of biomolecular folding is of great help.
The competition between a very large number of structures, that may lead to misfolding, 
makes modeling of folding a difficult and challenging problem in biological physics where 
disorder and frustration play a crucial role [6] [7]. RNA mostly folds in a hierarchical
fashion dominated by the formation of secondary structures [U EJ \W\ [T5] . In contrast to 
proteins where native state prediction is very difficult, it is possible to infer the correct 
secondary structure of RNA molecules from computer calculations (Mfold). This makes 
RNA folding a more tractable theoretical problem than protein folding. Bi-stability and 
misfolding in nucleic acids have been recently investigated in temperature ramping [11] 
and force pulling [12] experiments. 

In this work we address the problem of folding/misfolding in RNA molecules that are 
stretched by mechanical forces. Using single molecule techniques it is nowadays possible to 
pull on individual molecules such as biopolymers (e.g. nucleic acids, proteins, sugars...), 
molecular complexes (e.g. motor proteins and DNA/protein fibers) or even to stretch 
cells. Single molecule techniques provide valuable information about the thermodynamics 
and kinetics of biomolecular processes, thereby enlarging our knowledge of fundamental 
processes at the molecular and cellular level [13]. Among the most successful techniques 
in the field are optical tweezers, AFM and magnetic tweezers, all of them capable of exerting
forces in the piconewton (pN) range (1 pN = $10^{-12}$ N). Various studies have investigated
the unfolding/refolding of individual RNA molecules using optical tweezers. RNA hairpins
are typically unzipped at forces around 15 pN where base pairs are disrupted by the
direct action of force. Folding kinetics under force is of current interest as it provides an
alternative route to investigate the problem of molecular folding, complementary to stud- 
ies of folding by varying temperature or denaturant concentration. What is the main 
effect of force in RNA folding? Under the action of mechanical forces, the formation of 
secondary contacts in RNA between bases located at distant segments of the molecule is 
hampered by the stretching effect of the force. Starting from a stretched state and by 
progressively decreasing the force, folding is partially a sequential process in contrast to 
the non-sequential mechanism observed in thermal folding [16]. Here we introduce a
phenomenological model, based on a sequential dynamics at the level of individual base pairs,
that is useful to investigate folding and misfolding of RNA molecules that lack tertiary 
contacts. We apply it to interpret and reproduce experimental results recently obtained 
in the three-helix junction S15 RNA molecule, hereafter referred to as RNAS15, pulled by
optical tweezers p2] (see Fig. [I]). These experiments consist of repeated force cycles that 
start from the fully stretched molecule at high forces. The force is first decreased down 
to low values to let the molecule refold. Next, it is increased up to the initial value in 
order to unfold the molecule again [18]. In this way the folding reaction can be monitored
as a function of time. In such experimental conditions, we show that RNAS15 under- 
goes force-induced misfolding behavior as a consequence of the competition between the 
formation of two hairpins that cannot coexist in the same conformation. The computed 
misfolding probability, defined as the probability to end up in the misfolded state at the 
end of the relaxing process, is in good agreement with that obtained in the experiments. 
We are also able to reproduce the experimental unfolding and refolding force-extension 
trajectories, and obtain distributions of breakage forces (i.e. the force at which the native 
structure unfolds) that match the experimental ones at different loading rates. 

2 Two unfolding patterns in RNAS15 

The present work is based on previous pulling experiments [17] where optical tweezers [19]
were used to induce unfolding and refolding in RNAS15 at room temperature (T = 298K) 
in a solvent free of magnesium ions to avoid the formation of tertiary contacts. In these 
experiments a molecular construct is synthesized where the molecule RNAS15 is inserted 
between molecular DNA/RNA hybrid handles that provide enough space between the 
two beads to avoid non-specific interactions between the molecule and the beads (see Fig.
[1]). The force applied on the molecular construct (RNAS15 plus handles) is then ramped
at constant speed [20] between 2 pN and 20 pN at two loading rates, $r = 12\,\mathrm{pN\,s^{-1}}$
and $r = 20\,\mathrm{pN\,s^{-1}}$. At 2 pN (20 pN), the thermodynamically stable state is the folded
(stretched) state. The output of the experiments is the force-extension curve that gives 
the force applied to the molecule as a function of the molecular extension. During the 
unfolding part of the cycle (2 pN → 20 pN), two types of unfolding curves, referred
to as major and minor, are observed (see Fig. [1]). The major curves correspond to
approximately 95% (90%) of the trajectories at $r \simeq 12\,\mathrm{pN\,s^{-1}}$ ($\simeq 20\,\mathrm{pN\,s^{-1}}$). The minor
curves correspond to the remaining ~5% (~10%).

The major curves show a cooperative transition similar to that observed in the unzipping
of small RNA hairpins [17] [18]. Up to forces ~15 pN, the force-extension curve
corresponds to the stretching of the molecular handles used to manipulate the molecule
[17] [18]. The sudden large gain in the extension at forces around 15 pN is consistent with
the complete opening of RNAS15, which is 77 bases long. On the other hand, the minor curves
do not show the typical stretching behavior of the handles at low forces ($f < 14$ pN).
In particular, a non-cooperative transition occurs at force values between 6 and 9 pN. 
At these forces, the minor trajectories show large fluctuations in the extension (Fig. [1]) 
suggesting the presence of fast conformational events where the molecule partially unfolds 
and refolds. Moreover, the cooperative transition observed in the minor curves at forces 
around 14 pN corresponds to the opening of a ~30-base domain that is much shorter
than the total length of the RNA molecule. 

As shown in Fig. [2], the major unfolding curves are well reproduced by using an extension
of the sequential kinetic model introduced by Cocco et al. [21] [22], applied to
the native three-helix junction (denoted by N). The model in [21] describes the folding/unfolding
force kinetics of single hairpins at the level of individual base pairs. It has
one free parameter, the attempt frequency $k_a$, for the opening and closing rates
of a single base pair (see Methods). We extend this model to include multi-branched
structures such as N in RNAS15, which is composed of a stem S that branches into two
hairpins H1 and H2 (Fig. [2]). We also include the effect of the instrumental setup used
in the optical tweezers experiments [23].

Our numerical results show that, during the unfolding transition, the whole structure 
unfolds immediately after the stem opens. Accordingly, an analysis of the distribution
of breakage forces predicts a transition state for the unfolding reaction that is located 
close to the native state (see Methods). The corresponding kinetic barrier is actually 
generated by the presence of successive strong GC base pairs in the stem. On the other 
hand, this sequential dynamics applied to the structure composed of the native hairpins 
H1 and H2 does not reproduce the minor curves (data not shown). This suggests that
the minor curves correspond to the unfolding of a misfolded structure, rather than to
the unfolding of a structure that is partially folded into N (with hairpins H1 and H2,
but not the stem, formed). By using the Vienna package for predicting RNA structures 
[25] we have searched for the most stable structure without the stem formed (in order 
to avoid the large cooperative rip characteristic of the major curves). This structure, 
denoted as M, is composed of two non-native hairpins and has a free energy 6.3
kcal/mol ($\simeq 10.5\,k_BT$) above that of the native structure (see Methods and Fig. [3]). Note
that N and M cannot coexist since the same nucleotides are involved
in different base pairings. Upon stretching M, numerical simulations show minor-like
unfolding curves similar to the experimental ones (see Fig. [3]). In the simulations, the
cooperative transition observed around 14 pN corresponds to the unfolding of the ~30-base
hairpin of M, as shown in Fig. [3b]. This figure also shows that, for loading rates
similar to those of the experiments, the other hairpin of M unfolds in a non-cooperative
way at force values
between 6 and 9 pN (see Appendix [A] for a discussion on this issue). This corresponds
to the non-cooperative transition observed in the experimental (unfolding) minor curves
(see above and Fig. [1]). In the following, we provide quantitative evidence showing that
the minor curves indeed result from the formation of M. 

3 The minimal structures model 

In order to investigate the folding/misfolding in RNAS15 we introduce a model that can
be applied to any nucleic acid secondary structure. We call it the minimal structures
model (MSM). The essential idea behind the model consists in associating with each type of
experimental unfolding curve (two in the case of RNAS15, "major" and "minor") a unique
stable structure, whose unfolding force-extension pattern, obtained using the sequential 
dynamics, reproduces the experimental one. From this set of stable structures, that we call 
minimal structures, we generate the ensemble of configurations used to investigate both 
the unfolding and the refolding of the molecule. These configurations, hereafter referred 
to as MSM configurations, are built as follows. First, we consider all the intermediate 
configurations resulting from the sequential unfolding of each minimal structure. Each of 
these intermediate configurations is composed of hairpins that are separated by regions 
of unpaired bases. The ensemble of MSM configurations results from all the possible 
combinations of these hairpins (Fig. [4]). The initial set of locally stable structures is said
to be minimal since each of these structures is necessary to reproduce one of the patterns of
unfolding force-extension curves obtained in the experiments. Moreover, this minimal set
of structures makes simulations of kinetics affordable from a computational point of view
(the number of configurations in the MSM grows polynomially, as $\prod_{i=1}^{\#MS} N_i$,
$N_i$ being the total number of base pairs of the minimal structure $i$ and $\#MS$ the total
number of minimal structures). Although the inclusion of more structures might appear
desirable, the implementation of the kinetics soon becomes exceedingly complicated
and little is actually gained regarding comparison with the experiments. Finally, the
dynamics that we implement at the level of single base pairs [21] satisfies detailed balance 
and is ergodic (i.e. each configuration in the MSM is connected through a path, made 
out of a finite number of successive openings and closings of base pairs, to any other
configuration). Detailed balance and ergodicity are essential properties of the dynamics 
ensuring that, in the equilibrium state, all configurations are accessible and sampled 
according to the Boltzmann-Gibbs distribution. Detailed balance and ergodicity thus provide
the link between dynamics and thermodynamics, so that time averages can be replaced by
ensemble averages.
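
As a minimal sketch of what detailed balance guarantees, the following Python snippet builds a toy linear chain of configurations (state i has i opened base pairs), assigns Metropolis-type opening and closing rates, and checks that the master equation relaxes to the Boltzmann-Gibbs distribution. The free energies, the Metropolis form of the rates and all function names are assumptions made for illustration; they are not the rates of the model in [21].

```python
import math

def metropolis_rates(G, k_a=1.0):
    """Opening and closing rates between neighbouring states of a toy chain.

    G[i] is the free energy (in units of k_B T) of the state with i opened
    base pairs; k_a is an attempt frequency in arbitrary time units.
    The Metropolis form is an illustrative assumption.
    """
    k_open = [k_a * min(1.0, math.exp(-(G[i + 1] - G[i]))) for i in range(len(G) - 1)]
    k_close = [k_a * min(1.0, math.exp(G[i + 1] - G[i])) for i in range(len(G) - 1)]
    return k_open, k_close

def relax(G, k_a=1.0, dt=0.2, steps=20000):
    """Integrate the master equation with explicit Euler steps."""
    k_open, k_close = metropolis_rates(G, k_a)
    n = len(G)
    p = [1.0] + [0.0] * (n - 1)  # start with all base pairs closed
    for _ in range(steps):
        # net probability flux from state i to state i + 1
        flux = [k_open[i] * p[i] - k_close[i] * p[i + 1] for i in range(n - 1)]
        for i in range(n - 1):
            p[i] -= dt * flux[i]
            p[i + 1] += dt * flux[i]
    return p

def boltzmann(G):
    """Boltzmann-Gibbs weights for free energies given in units of k_B T."""
    w = [math.exp(-g) for g in G]
    z = sum(w)
    return [x / z for x in w]

G = [0.0, 2.5, 1.0, 3.0, 0.5]  # hypothetical free energies, in k_B T
p_long_time = relax(G)
p_equilibrium = boltzmann(G)
```

Because the ratio of forward to backward rates equals the Boltzmann factor of the free energy difference, the net flux between every pair of neighbouring states vanishes at equilibrium, and `p_long_time` coincides with `p_equilibrium` up to numerical precision.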

During refolding there is always competition in the formation of hairpins that have 
bases in common (e.g. H1 and one of the hairpins of M in RNAS15). Therefore, with more than
one minimal
structure, the MSM naturally leads to the formation of the different minimal structures
and hence to misfolding. In RNAS15, comparison between experiments and numerical
simulations for the unfolding curves (Figs. [2] and [3]) suggests choosing N and M as the
minimal structures, the total number of configurations within the MSM being on the
order of a few hundred. We have carried out numerical simulations of force cycles in
the MSM in RNAS15 and observed the presence of minor and major unfolding curves in 
agreement with the experiments. Yet this version of the model is not sufficient to reproduce
the experimental results, as we are still unable to reproduce the unfolding
and refolding curves simultaneously in a quantitative way (data not shown). In particular,
by choosing a
value of the attempt frequency $k_a$ that fits the unfolding curves well, we obtain refolding
curves that do not match the experimental results (typical refolding forces are 2 pN higher
in simulations than in experiments). Different causes could explain this discrepancy. First,
we have neglected a large number of configurations that might compete with those of the
MSM and whose presence would lead to lower refolding forces in agreement with the
experimental results. In addition, the transient formation of tertiary interactions, such as
pseudo-knots, could be relevant during the folding process.

The number of secondary structures that can be formed in RNA grows exponentially 
with the total number of bases. Therefore, it is impossible, in large molecules, to simulate 
kinetics in the full ensemble of secondary structures. Although it is possible to determine
the free energy of all possible secondary structures, it appears extremely difficult to implement
kinetic rules between all possible configurations. The simplest strategy, in order
to include the effect of additional structures on the dynamics, is to consider all possible
secondary contacts that can be formed within the unpaired regions in a given MSM configuration.
Because the explicit inclusion of all possible secondary structures in the dynamics
is too difficult, we take advantage of approximate schemes to address this problem. The
current problem is reminiscent of that encountered in liquid or statistical field theories
where an infinite class of correlation functions or observables has to be solved
simultaneously. It is then common to solve the dynamics by closing the hierarchies of observables
by selecting only a specific subset among all possible classes and resumming all diagrams
within that subset. Here we adopt such a strategy. In the spirit of resummation techniques
in statistical physics, we integrate out all these additional structures and add corrections 
to the free energies of the MSM configurations as explained below. 

3.1 Estimate of the free-energy correction in the MSM. 

Let us consider a generic configuration C of the MSM with free energy $G(C, f)$ at a given
force $f$. C is by definition composed of hairpins and regions of unpaired bases (Fig. [5]).
Starting from this configuration, we can generate additional ones by allowing the formation 
of secondary contacts between complementary bases within each unpaired region. The 
inclusion of these additional configurations in the MSM would result in a larger ensemble 
of configurations. This would also modify the thermodynamics of the system. Hence, 
in order to keep an ensemble of configurations as small as possible, the effect of such 
additional configurations is taken into account by adding a free energy correction, $G_c(C, f)$,
to each configuration C. Subsequently, the free energy of any configuration C in the MSM
can be split into three contributions:

$$G(C, f) = G_0(C) + G_m(C, f) + G_c(C, f). \qquad (1)$$

$G_0(C)$ is the free energy of formation of the configuration C at zero force. $G_m(C, f)$
stands for the contribution to the mechanical free energy due to the stretching of the
unpaired regions that are exposed to the force. This is equal to $\int_0^f x_C(f')\,df'$, where $x_C(f)$
is the equilibrium average extension of the configuration C at force $f$. Finally, the free
energy correction at force $f$, $G_c(C, f)$, is added so that $G(C, f)$ includes C and all the
possible secondary structures that can be formed from C using the bases of the unpaired
regions. Note that some of these structures may correspond to configurations originally
belonging to the MSM and, therefore, should not be included in the calculation of $G_c(C, f)$.
In fact, the inclusion of such structures would lead to an incorrect and strongly biased
estimation of the free energy correction inherent to the large thermodynamic stability of
all configurations that belong to the MSM. The proper estimation of $G_c(C, f)$ is therefore
a very difficult task and a different strategy is required to circumvent this problem as we
shall explain in the following.
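
To make the stretching term concrete, the sketch below evaluates the integral of the extension over the force numerically. The text does not specify the elastic model here, so a freely jointed chain is assumed purely for illustration; the Kuhn length, the rise per base and all function names are hypothetical choices. The function returns the bare integral of $x(f')$ from 0 to $f$; how it enters $G_m$ follows the convention of Eq. (1).

```python
import math

KT = 4.11  # k_B T at 298 K, in pN nm

def fjc_extension(f, n_bases, kuhn=1.5, rise=0.59):
    """Mean extension (nm) of n_bases unpaired bases at force f (pN).

    Freely jointed chain; kuhn (nm) and rise per base (nm) are assumed values.
    """
    if f <= 0.0:
        return 0.0
    z = f * kuhn / KT
    return n_bases * rise * (1.0 / math.tanh(z) - 1.0 / z)

def stretching_integral(f, n_bases, steps=2000):
    """Trapezoidal estimate of the integral of x(f') df' from 0 to f (pN nm)."""
    h = f / steps
    total = 0.5 * (fjc_extension(0.0, n_bases) + fjc_extension(f, n_bases))
    total += sum(fjc_extension(k * h, n_bases) for k in range(1, steps))
    return total * h

def stretching_integral_exact(f, n_bases, kuhn=1.5, rise=0.59):
    """Closed form of the same integral for the freely jointed chain (f > 0)."""
    z = f * kuhn / KT
    return n_bases * rise * KT / kuhn * math.log(math.sinh(z) / z)
```

For the freely jointed chain the integral has a closed form, which gives a convenient check of the numerical quadrature before a more realistic elastic model is substituted.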

In the present treatment, for the sake of simplicity, we do not consider interactions between
bases of different unpaired regions. As a consequence, $G_c(C, f)$ can be decomposed
as a sum of independent contributions $g_c^i$ coming from each unpaired region $i$. Having
proceeded so far, we seek an estimate of the correction $G_c(C, f)$ that can be efficiently
implemented in the numerical simulations of the kinetics. We use an annealed
approximation where the contribution from each region $i$ only depends on the number $n_i$
of bases of that region, $g_c^i = g_c(n_i, f)$. As a result, we get $G_c(C, f) = \sum_{i=1}^{N_U} g_c(n_i, f)$, where
$N_U$ is the total number of unpaired regions (see Fig. [5]).

As the free energy of an RNA molecule depends strongly on its sequence, $g_c(n, f)$ should
be estimated for each primary sequence. In this regard, our estimation procedure consists,
first, in evaluating the average free energy of an n-base-long polynucleotide chain that
is chosen within that sequence (see Methods). The average is taken over all possible
segments of length $n$ along that sequence. From this value we subtract the initial stretching
free energy $G_m(n, f)$ of the n-base-long polynucleotide and obtain $F(n, f)$. $F(n, f)$ is
always a lower bound to $g_c(n, f)$ as it includes the contribution coming from the additional
new configurations but also the contribution from configurations already generated by the
minimal structures. In fact, by averaging over all segments covering the whole sequence,
the term $F(n, f)$ gets contributions from all possible hairpins that can be formed with $n$
bases. Therefore $F(n, f)$ is biased toward low values due to the stabilizing contribution
to the free energy by the minimal structures (e.g. the native or the misfolded structures
in the case of RNAS15). This bias is particularly strong at low forces where the native
hairpins dominate the annealed average. How does $F$ depend on $n$ and $f$? The fact that
the free energy $F$ is an extensive variable (i.e. depends linearly on the size of the system $n$,
at least for $n > 5$ where loop formation is possible) implies that the first derivative $dF/df$
(i.e. with respect to the intensive variable $f$) also depends linearly on $n$. These properties are
well confirmed by using the Vienna package [25], which gives the exact partition function
and the equilibrium free energy for any RNA sequence. In the case of RNAS15 we find
$F(n, f) \simeq a_f (n - 5)$, where the parameter $a_f$ depends linearly on $f$ up to a certain force
value $f_c \simeq 12$ pN at which it vanishes: $a_f = a (f - f_c)/f_c$ if $f < f_c$ and $a_f = 0$ if
$f > f_c$, with $a \simeq 0.5$ kcal/mol $= 0.9\,k_BT$ (see Fig. [5]). We stress that, for arbitrarily long
sequences, determining $a$ and $f_c$ is still possible by restricting the calculation of the free
energy $F(n, f)$ to small values of $n$ (e.g. up to $n \simeq 50$) where $a_f$ is a linear function of $f$
(Fig. [5]).

How do we proceed to estimate the true correction $g_c(n, f)$? The functional
form obtained for $F(n, f)$ suggests the same functional dependence for $g_c(n, f)$, albeit
with a priori different parameters $a$ and $f_c$. In $F(n, f)$, $f_c$ is the force value where the
free energy correction vanishes and below which secondary structures become, on average,
more stable than the fully unfolded or unpaired form. At forces around $f_c \simeq 12$ pN many
other configurations can be as stable as the MSM configurations. Therefore, the value
of $f_c$ is not expected to be very sensitive to the bias introduced in the annealed average
by the inclusion of the MSM configurations. Thus, we keep $f_c \simeq 12$ pN for $g_c(n, f)$ also.
Consequently, the free energy correction term leads to only one additional free parameter
in the model, which we call $A$. The free energy correction finally reads $g_c(n, f) \simeq A_f (n - 5)$,
with $A_f = A (f - f_c)/f_c$ if $f < f_c$ and zero otherwise. The parameter $A$ corresponds to
the free energy correction per base pair at zero force and satisfies $A < a$ because $F(n, f)$
is a lower bound to $g_c(n, f)$. What is the main effect of $A$ on the kinetics of unfolding
and folding? Additional configurations naturally tend to slow down the formation of
individual hairpins that belong to the minimal structures. Accordingly, the free energy
correction modifies the closing rates rather than the opening rates of individual base pairs
(see Methods). Therefore the value of the parameter $A$ mostly determines the kinetics of
folding rather than unfolding, and a larger value of $A$ tends to slow down the kinetics of
folding.
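
The functional form of the correction can be written down in a few lines. The sketch below is an illustration only: it uses the values $A = 0.3\,k_BT$ and $f_c = 12$ pN quoted in the text as defaults, the function names are hypothetical, and setting the correction to zero for regions of five bases or fewer is an assumption suggested by the remark that loop formation requires $n > 5$.

```python
def correction_per_base(f, A=0.3, f_c=12.0):
    """A_f = A (f - f_c) / f_c below f_c, zero otherwise (units of k_B T)."""
    return A * (f - f_c) / f_c if f < f_c else 0.0

def region_correction(n, f, A=0.3, f_c=12.0, n_min=5):
    """g_c(n, f) = A_f (n - 5) for one unpaired region of n bases.

    Returning zero for n <= n_min is an assumption of this sketch.
    """
    if n <= n_min:
        return 0.0
    return correction_per_base(f, A, f_c) * (n - n_min)

def total_correction(unpaired_lengths, f, A=0.3, f_c=12.0):
    """G_c(C, f) as a sum of independent contributions from each unpaired region."""
    return sum(region_correction(n, f, A, f_c) for n in unpaired_lengths)

# Example: a configuration with two unpaired regions of 10 and 20 bases at 6 pN.
example = total_correction([10, 20], 6.0)
```

With these sign conventions the correction is negative below $f_c$, that is, it stabilizes the unpaired regions of a configuration, and it vanishes at and above $f_c$.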

3.2 Applying the model to RNAS15. 

Overall the model requires only two free parameters, $k_a$ and $A$, in order to fit all the
experimental data available for RNAS15. The parameters $A = 0.3\,k_BT$ and $k_a = 10^7\,\mathrm{s}^{-1}$
lead, at both loading rates, to unfolding and refolding force-extension curves, distributions
of breakage force and misfolding probabilities that are in quantitative agreement with
those found in the experiments (Figs. [6] and [7]). Since no further explicit structures are
necessary to reproduce the experimental data, we conclude that, in this case, a model 
containing the minimal structures N and M plus the free energy correction term, is 
enough to explain both the unfolding and refolding kinetics of RNAS15. In this regard, 
we have extended our analysis by including other minimal structures different from N 
and M and have obtained very similar results (data not shown). 

Regarding the force-extension curves we note that the shoulder observed during the
refolding trajectory (Fig. [6]) is mainly due to the transient formation of hairpins (the
native hairpins H1 and H2 and the two hairpins of M). On the other hand, the minor curves
correspond to the unfolding
of the misfolded structure M, in which a non-native hairpin does not allow the formation of a
native hairpin. M acts as a kinetic trap that impedes the formation of N. Misfolding
in RNAS15 is not induced by thermal fluctuations since the free energy difference between
N and M is very large, $\Delta\Delta G_0 \simeq 10.5\,k_BT$. Rather it is induced by the force, which tends
to favor the misfolding pathway.

Finally, we note that the free energy correction per base pair, $A \simeq 0.3\,k_BT$, is an order
of magnitude smaller than the typical free energy of formation of individual base pairs
($\simeq 3\,k_BT$). Yet, it is necessary to include this correction (about 10%) to quantitatively
reproduce the experimental features of the unfolding/refolding kinetics in RNAS15. 

4 Misfolding probability



In a force cycle protocol, misfolding can be quantified by the misfolding probability $P_M$.
This is given by the probability to end up in the misfolded state at the end of the relaxing
process. Multi-state models of chemical reactions provide a general picture of the
unloading rate dependence of this probability. The simplest model consists of a three-state
system (native N, misfolded M and stretched S) where the misfolded state M
acts as a kinetic trap during the folding transition (Fig. [7]). Starting from S at high
forces, and by decreasing the force at a constant rate $r$, the general question we ask is
how $P_M(r)$ depends on $r$. In the general situation of a force-independent position of the
kinetic barriers $B_N$, $B_M$ (located at distances $d_N$, $d_M$ from S), we find that $P_M(r)$ has
a unique maximum located at $r^*$ (see Appendix [B]). However, if $d_N$ and $d_M$ depend on
the force, $P_M(r)$ shows a more complex behaviour where several maxima can appear (see
Appendix [B]). This general scenario is expected to be applicable to RNAS15, where the
results obtained from simulations of the MSM show a $P_M(r)$ with two maxima (Fig. [7b]).
From a general point of view, a $P_M(r)$ with more than one maximum suggests a complex
free energy landscape with force-dependent transition states (leading to force-dependent
fragilities as in the case of RNA hairpins [26]).
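
The competition between the two folding channels under a decreasing force can be illustrated with a deliberately stripped-down kinetic sketch. It is not the model of Appendix [B]: both transitions S → N and S → M are taken as irreversible, the folding rates follow an assumed Bell-type force dependence, and every parameter value below is hypothetical. Since escape from M and unfolding are omitted, the sketch is not meant to reproduce the maxima of $P_M(r)$ discussed above; it only shows how the final populations depend on the unloading rate.

```python
import math

KT = 4.11  # k_B T at 298 K, in pN nm

def folding_rate(f, k0, d):
    """Bell-type folding rate (1/s): force f (pN) slows folding over a barrier at distance d (nm)."""
    return k0 * math.exp(-f * d / KT)

def final_probabilities(r, f_max=20.0, f_min=2.0, steps=20000,
                        k0_N=1.0e4, d_N=8.0, k0_M=1.0e2, d_M=4.0):
    """Return (P_S, P_N, P_M) after lowering the force from f_max to f_min at rate r (pN/s).

    S -> N and S -> M are irreversible; rates are held constant within each
    small force step. All default parameters are hypothetical.
    """
    df = (f_max - f_min) / steps
    dt = df / r
    p_S, p_N, p_M = 1.0, 0.0, 0.0
    for k in range(steps):
        f = f_max - (k + 0.5) * df          # force at the middle of the step
        k_N = folding_rate(f, k0_N, d_N)
        k_M = folding_rate(f, k0_M, d_M)
        k_tot = k_N + k_M
        lost = p_S * (1.0 - math.exp(-k_tot * dt))  # probability leaving S in this step
        p_N += lost * k_N / k_tot
        p_M += lost * k_M / k_tot
        p_S -= lost
    return p_S, p_N, p_M

slow = final_probabilities(1.0)
fast = final_probabilities(100.0)
```

With these hypothetical parameters the misfolded state has the shorter barrier distance, so it becomes accessible at higher forces than the native state, and slow unloading leaves more probability in M than fast unloading does.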

5 Discussion and conclusions 

In this work we have investigated the folding/unfolding behaviour of nucleic acid sec- 
ondary structures that are pulled by mechanical forces. To this aim we have introduced a 
phenomenological model (MSM) that is based on: the sequential dynamics of a minimal 
number of structures; and the inclusion of corrections in the free energy that account for 
the configurational entropy contributed by the exponentially large number of neglected 
secondary structures. The model describes force-induced misfolding of nucleic acid secondary
structures such as RNA and DNA. It can be applied to arbitrary nucleic acid
sequences that can form different secondary structures and can be used to predict the
phenomenology observed in dynamic force spectroscopy measurements (breakage force
distributions, force-extension curves and misfolding probability). The applicability of the
approach has been shown in the case of the RNA three-helix junction S15.

The model can also be used to predict different folding kinetics scenarios
by implementing different sets of minimal structures. Sometimes the full applicability
of the model may require the prior experimental identification of the minimal set of
structures that generate the different patterns of force-extension curves. Although the
model cannot predict misfolding for a given sequence, it can be applied to identify possible
misfolded states as well as kinetic intermediates by doing systematic in silico experiments.
A useful strategy could be to use the Vienna package [25] to build up the minimal set of
structures and then determine potential misfolded states by generating different
sets of secondary structures for the given RNA sequence. Subsequently one should search
for the most stable structures that can be formed when native domains are not allowed.
However, we are not yet able to provide a recipe that leads to the systematic determination
of these states. As a consequence, the method we used for the determination of the
misfolded structure must be specifically adapted to every RNA sequence.

For a given nucleic acid sequence the model has only two fitting parameters, $k_a$ and
$A$. The first one, $k_a$, is an attempt frequency at the level of individual base pairs which
should not vary much with the specific sequence under study. In this regard, the value
we report for $k_a$ in RNAS15 is in agreement with the values obtained for other RNA
molecules |2H [25] as expected. The second parameter, $A$, is a thermodynamic parameter
related to the configurational space of the molecule, i.e. the space of secondary structures 
associated with a given nucleic acid sequence. In principle, for a given RNA, the larger 
the ensemble of MSM configurations, the smaller the correction, and hence the value of A. 
However, the total number of configurations included in the free energy correction grows 
exponentially with the total number of base pairs of the molecule, whereas the number 
of configurations in the MSM grows as a power of that total number. Consequently, the 
inclusion of more minimal structures in the model should not change the value of $A$ much.
In addition, $A$ is the free energy correction per base pair and, therefore, it should not be
very sensitive to the specific molecular sequence. Therefore it is reasonable to expect that
the reported value of $A \simeq 0.3\,k_BT$ is largely constant among all RNA sequences under
identical environmental conditions (e.g. temperature and salt). What happens in the
case of short canonical (i.e. fully complementary or Watson-Crick base-paired) hairpins?
These molecules show two-state behavior and cooperative folding [21] [23]. Yet the entropic
correction might still be necessary to fully describe the kinetics of folding. In this case,
there will be just one minimal structure (the native one) so the effect of the entropic 
correction, albeit small, could be experimentally observable. It would be very interesting 
to carry out future experiments capable of identifying, in generic two-state molecules, this 
correction of entropic origin. Finally, let us mention that a different theoretical approach 
is required to model the thermal denaturation of RNAs and the associated folding and 
misfolding mechanisms. In this case, the dissociation of base pairs is not a sequential 
process anymore. 

Recent pulling experiments in TAR RNA [12] have shown how stretching forces can
help the formation of the native structure when the molecule is initially trapped in
misfolded structures. Here, we have found that a mechanical force can also induce the
opposite effect, by favouring misfolding pathways that are unlikely in the absence of force.
It remains a challenge to apply this model to predict the detection of misfolded structures
and kinetic intermediates in single molecule pulling experiments for specifically designed
nucleic acid sequences.

6 Methods 

Optical tweezers experimental setup. 

Experiments in RNAS15 were reported in a previous paper by Collin et al. [17]. Buffer
conditions were 100 mM Tris-HCl, pH 8.1, 1 mM EDTA, free of magnesium ions, at room
temperature T = 298 K. RNAS15 is attached, via RNA/DNA handles (~160 nm), to two
micron-sized polystyrene beads. One bead is held fixed at the tip of a micropipette. The
force is measured through the detection of the light deflected by the bead in the optical
trap (Fig. 1).

Transition state along the unfolding pathway. 

From the breakage force data, one can obtain information about the transition state
corresponding to the force-induced unfolding pathway using a two-state model. According
to this model, the width $\sigma_f$ of the breakage force distribution is inversely proportional
to the distance $x^F$ from the transition state to the folded native state, that is
$\sigma_f \simeq k_B T / x^F$. In RNAS15, this relation leads to a transition state for the
unfolding reaction that corresponds to a configuration where only the first two or three
base pairs of the stem are opened.

Extended sequential dynamics.

In the sequential model of Cocco et al. [21], the successive closing and opening of base pairs
is restricted to take place at the base of the hairpin, defined as the first 5'-3' base pair
formed (Fig. 4). The corresponding opening rates ($k_o$) depend on the free energy of formation
of the base pairs, $\Delta G_0$: $k_o = k_a \exp(-\Delta G_0 / k_B T)$, where $k_a$ is an attempt
frequency. The closing rates ($k_c$) depend on the mechanical energy loss, $\Delta G_m$, due to
the shortening of the unpaired part of the molecule: $k_c = k_a \exp(-\Delta G_m / k_B T)$.
These free energies have been estimated from thermal denaturation experiments [27] and single
molecule force experiments [28, 29], respectively. The attempt frequency $k_a$ is therefore the
only free parameter of the model. Typical values measured by NMR fall in the range
$10^7$-$10^8$ Hz [24]. The extension of the model to multiple hairpins is depicted in Fig. 4.
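
As a rough illustration of the two rate expressions above, the following Python sketch
evaluates the opening and closing rates with free energies expressed in units of $k_B T$.
The function names and the numerical free energies are hypothetical and chosen only for
illustration; the value $k_a = 10^7$ Hz is simply taken from the NMR range quoted above.

```python
import math

def opening_rate(k_a, dG0):
    """Opening rate k_o = k_a * exp(-dG0), with dG0 in units of k_B T."""
    return k_a * math.exp(-dG0)

def closing_rate(k_a, dGm):
    """Closing rate k_c = k_a * exp(-dGm), with dGm in units of k_B T."""
    return k_a * math.exp(-dGm)

k_a = 1.0e7   # attempt frequency in Hz (within the range quoted in the text)
dG0 = 3.0     # hypothetical base-pair formation free energy, in k_B T
dGm = 2.0     # hypothetical mechanical energy loss, in k_B T

k_o = opening_rate(k_a, dG0)
k_c = closing_rate(k_a, dGm)
print(f"k_o = {k_o:.3e} Hz, k_c = {k_c:.3e} Hz, k_o/k_c = {k_o / k_c:.3f}")
```

Since $k_a$ multiplies both rates, the ratio $k_o/k_c$ depends only on the free energies,
whereas $k_a$ sets the overall time scale.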

In our simulations, we allow for the formation of both Watson-Crick and non-canonical
(GA and GU) base pairs. The values for the free energies of formation of the different
base pairs have been obtained from the Vienna package (corresponding to 1 M NaCl
[25]) by adding a uniform correction in order to meet the salt condition of the buffer
used in the experiments (100 mM Tris-HCl). The salt correction is determined by imposing
that the free energy of formation of RNAS15 equals the value recovered
in the experiments [17]. The algorithm involves the whole experimental setup (handles
and beads) within the so-called mixed ensemble, where the control parameter is the
distance between the optical trap and the immobilized bead [23] (rather than the force).
Therefore, we include in $\Delta G_m$ the contribution of both the handles and the unpaired RNA.
The handles and the regions of unpaired RNA bases are described by using a worm-like
chain model [30, 31] with persistence lengths of 10 nm (handles) and 1 nm (RNA) and
contour lengths of 0.26 nm/bp (handles) and 0.59 nm/base (RNA). These values fit the
experimental force-extension curves reasonably well in the region where the handles are
stretched. Each hairpin contributes to the total extension with an additional extension
of ~2 nm. Finally, when taking into account our phenomenological corrections, $k_c$
becomes $k_c = k_a \exp(-(\Delta G_m + \Delta G_c)/k_B T)$, where $\Delta G_c$ is the difference
in the free energy corrections between the open and closed configurations.
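
The elastic response described above can be sketched as follows. This is only a minimal
illustration that assumes the Marko-Siggia interpolation formula for the worm-like chain
(the text does not state which WLC expression was used) and $k_B T = 4.11$ pN nm at 298 K.
The function names and the example of 30 unpaired bases are hypothetical.

```python
KBT = 4.11  # thermal energy in pN*nm at 298 K (assumed value)

def wlc_force(x, L, P, kBT=KBT):
    """Marko-Siggia interpolation: force (pN) at extension x (nm).

    L is the contour length (nm) and P the persistence length (nm).
    """
    z = x / L
    return (kBT / P) * (0.25 / (1.0 - z) ** 2 - 0.25 + z)

def wlc_extension(f, L, P, kBT=KBT, tol=1e-10):
    """Invert wlc_force by bisection: extension (nm) at force f (pN)."""
    lo, hi = 0.0, L * (1.0 - 1e-12)
    while hi - lo > tol * L:
        mid = 0.5 * (lo + hi)
        if wlc_force(mid, L, P, kBT) < f:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

force = 10.0                                            # pN
x_handles = wlc_extension(force, L=160.0, P=10.0)       # ~160 nm handles, P = 10 nm
x_rna = wlc_extension(force, L=30 * 0.59, P=1.0)        # 30 unpaired bases (hypothetical)
print(f"handles: {x_handles:.1f} nm, unpaired RNA: {x_rna:.1f} nm at {force} pN")
```

Bisection is used because the interpolation formula gives the force as a monotonic
function of the extension but has no simple closed-form inverse.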

Free energy of an n-bases long segment of RNAS15. 

Any secondary structure that is built up from an n-bases long polynucleotide can be seen
as a succession of unpaired regions and partial secondary structures closed by a base
pair (for instance, in Fig. 5 the partial secondary structures are the hairpins). The free
energy of such a secondary structure can then be divided into the mechanical free energy
corresponding to the stretching of both the unpaired regions and the base pairs that close
the partial secondary structures, plus the free energy of formation of each partial secondary
structure. In RNAS15, we estimate the latter using the Vienna package. Computing
the free energy of all the secondary structures that can be formed with the n-bases long
polynucleotide allows us to determine the partition function, and hence the free energy,
of the n-bases long polynucleotide at force $f$.
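
The last step, going from the free energies of the individual secondary structures to the
free energy of the segment, can be sketched as below. This is only an illustration of the
standard relation $G = -k_B T \ln \sum_i \exp(-G_i / k_B T)$; the list of structure free
energies is hypothetical and the function name is not taken from the original work.

```python
import math

def ensemble_free_energy(structure_energies):
    """Free energy -ln(sum_i exp(-G_i)) of an ensemble, all in units of k_B T.

    The minimum is factored out of the sum to avoid numerical overflow.
    """
    g_min = min(structure_energies)
    z_shifted = sum(math.exp(-(g - g_min)) for g in structure_energies)
    return g_min - math.log(z_shifted)

# Hypothetical free energies (in k_B T) of the structures of one segment at a given force.
example_energies = [-4.0, -3.5, -1.0, 0.0]
g_segment = ensemble_free_energy(example_energies)
print(f"segment free energy = {g_segment:.3f} k_B T")
```

The ensemble free energy is always lower than that of the most stable structure, the
difference being the entropic contribution of the remaining structures.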

Misfolding probability in RNAS15. 

We describe the dynamics of the MSM using a set of master equations (see Appendix C).
These equations describe the time evolution of the probability of the RNA being in
a specific MSM configuration. To get the misfolding probability, we numerically integrate
the set of equations. The force is decreased at a given unloading rate $r$, starting from the
stretched state at an initial force $f_{\rm in} = 20$ pN. The misfolding probability is computed
at the end of the relaxation process when the force vanishes, i.e. when $t = 20/r$.

A Appendix: Cooperative unfolding of hairpins $H_1^M$ and $H_2^M$

The misfolded structure M is composed of two hairpins, $H_1^M$ and $H_2^M$. Both hairpins have
similar thermodynamic stabilities and they present several mismatches (internal loops and
bulges). Why does $H_2^M$ unfold cooperatively whereas $H_1^M$ does not (see Fig. 3)? By using
the Vienna package [25] for the free energies of formation of the different base pairs, we can
compute the free energy of $H_1^M$ and $H_2^M$ as a function of the number of denatured base
pairs at the critical force where the folded and the unfolded hairpin are equally stable
(i.e. where both states have the same free energy). As shown in Fig. 8, the free energy
landscape associated with $H_2^M$ (blue) presents a high kinetic barrier between the folded and
the unfolded hairpin, whereas the free energy landscape associated with $H_1^M$ (red) is roughly
flat. This explains the difference in cooperativity observed between the two hairpins.



B Appendix: Misfolding in a three-state model 

In this section, we analyse in detail the dynamics of a three-state model where a misfolded
state (M) acts as a kinetic trap during the folding transition from the stretched state (S)
to the native state (N). Let us consider the case of a pulling protocol where the mechanical
force applied to the system decreases at a constant loading rate $r$. Starting from a high
force value where the stretched state is the most stable one, we show that the misfolding
probability $P_M$ at the end of the force-releasing process has a single maximum along
the $r$-axis.

We denote by $P_N(t)$, $P_M(t)$ and $P_S(t)$ the probabilities of being at time $t$ in the
states N, M and S, respectively. The relaxation process is governed by the following set of
master equations:

$$\dot P_N = \frac{dP_N}{dt} = k^f_{S\to N} P_S - k^f_{N\to S} P_N$$

$$\dot P_M = \frac{dP_M}{dt} = k^f_{S\to M} P_S - k^f_{M\to S} P_M$$

$$\dot P_S = \frac{dP_S}{dt} = k^f_{N\to S} P_N + k^f_{M\to S} P_M - (k^f_{S\to N} + k^f_{S\to M}) P_S \qquad (2)$$

where $k^f_{a\to b}$ is the transition rate from state $a$ to state $b$ at a given force $f$.

Note that this model does not allow for direct transition pathways connecting N and M.
Transitions between these states always pass through the stretched state S. S can then
be viewed as an obligatory intermediate state of the reaction $N \rightleftharpoons M$ (see Fig. 9).

Absorbing states 

In a first stage, we study the analytically tractable case where N and M are absorbing
states, i.e. $k_{N\to S} = 0$ and $k_{M\to S} = 0$. The set (2) of master equations becomes:

$$\dot P_N = k_{S\to N} P_S$$

$$\dot P_M = k_{S\to M} P_S$$

$$\dot P_S = -(k_{S\to N} + k_{S\to M}) P_S \qquad (3)$$

In the presence of a mechanical force that is coupled to the molecular extension, the rates
$k_{S\to N}$ and $k_{S\to M}$ can be written as $k_{S\to N} = k_N \exp(-\beta d_N f)$ and
$k_{S\to M} = k_M \exp(-\beta d_M f)$, respectively, where $d_N$ ($d_M$) is the distance along
the reaction coordinate between S and the kinetic barrier separating the state S from the
state N (M) (see Fig. 7), $k_N$ and $k_M$ are the corresponding rates at zero force, and
$\beta = (k_B T)^{-1}$ is the inverse of the thermal energy unit. Using these relations for the
rates and considering a ramping protocol where the force decreases at a constant rate $r$
($\dot f = -r$), the set of equations can be written in terms of the force as follows:

$$\frac{dP_N}{df} = -\frac{k_N}{r}\, e^{-\beta d_N f} P_S$$

$$\frac{dP_M}{df} = -\frac{k_M}{r}\, e^{-\beta d_M f} P_S$$

$$\frac{dP_S}{df} = \frac{1}{r}\left(k_N e^{-\beta d_N f} + k_M e^{-\beta d_M f}\right) P_S. \qquad (4)$$



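Starting from an initial stretched state at very large force ($f \to \infty$, $P_S = 1$,
$P_N = P_M = 0$), the solution to Eq. (4) is given by:

$$P_S(f) = \exp\left[-\frac{k_N e^{-\beta d_N f}}{r\beta d_N} - \frac{k_M e^{-\beta d_M f}}{r\beta d_M}\right]$$

$$P_N(f) = \frac{1}{r}\int_f^\infty dg\, k_N \exp\left[-\beta d_N g - \frac{k_N e^{-\beta d_N g}}{r\beta d_N} - \frac{k_M e^{-\beta d_M g}}{r\beta d_M}\right]$$

$$P_M(f) = \frac{1}{r}\int_f^\infty dg\, k_M \exp\left[-\beta d_M g - \frac{k_N e^{-\beta d_N g}}{r\beta d_N} - \frac{k_M e^{-\beta d_M g}}{r\beta d_M}\right] \qquad (5)$$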
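
As a consistency check of the absorbing-state solution, the following sketch integrates the
force-ramp equations numerically (fourth-order Runge-Kutta, stepping the force down from a
large value to zero) and compares the result with the closed form
$P_S(f) = \exp[-k_N e^{-\beta d_N f}/(r\beta d_N) - k_M e^{-\beta d_M f}/(r\beta d_M)]$.
The parameter values, the cut-off force `f_max` and the function names are assumptions made
for illustration only.

```python
import math

def closed_form_PS(f, r, kN, kM, dN, dM, beta=1.0):
    """Closed-form survival probability of the stretched state at force f."""
    return math.exp(-kN * math.exp(-beta * dN * f) / (r * beta * dN)
                    - kM * math.exp(-beta * dM * f) / (r * beta * dM))

def integrate_ramp(r, kN, kM, dN, dM, beta=1.0, f_max=40.0, steps=20000):
    """Integrate dP/df from f_max down to 0 with RK4; returns (P_N, P_M, P_S)."""
    def deriv(f, P):
        aN = kN * math.exp(-beta * dN * f) / r
        aM = kM * math.exp(-beta * dM * f) / r
        return (-aN * P[2], -aM * P[2], (aN + aM) * P[2])

    h = -f_max / steps            # negative step: the force decreases
    f, P = f_max, (0.0, 0.0, 1.0)  # start fully stretched
    for _ in range(steps):
        k1 = deriv(f, P)
        k2 = deriv(f + h / 2, tuple(p + h / 2 * k for p, k in zip(P, k1)))
        k3 = deriv(f + h / 2, tuple(p + h / 2 * k for p, k in zip(P, k2)))
        k4 = deriv(f + h, tuple(p + h * k for p, k in zip(P, k3)))
        P = tuple(p + h / 6 * (a + 2 * b + 2 * c + d)
                  for p, a, b, c, d in zip(P, k1, k2, k3, k4))
        f += h
    return P

P_N, P_M, P_S = integrate_ramp(r=1.0, kN=1.0, kM=1.0, dN=0.8, dM=1.0)
print(f"P_N = {P_N:.4f}, P_M = {P_M:.4f}, P_S = {P_S:.4f}")
print(f"closed-form P_S(0) = {closed_form_PS(0.0, 1.0, 1.0, 1.0, 0.8, 1.0):.4f}")
```

The three probabilities should sum to one at every force, which provides a simple test of
the numerical scheme.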

Let us focus now on the misfolding probability $P_M \equiv P_M(f = 0)$. Starting from Eq. (5)
and changing the integration variable to $s = e^{-\beta d_M g}$, $P_M$ can be written as:

$$P_M = P_M(\tilde r, \lambda, x) = \frac{1}{\tilde r}\int_0^1 ds\, \exp\left(-\frac{s + \lambda s^x}{\tilde r}\right), \qquad (6)$$

where $\lambda = k_N d_M/(k_M d_N)$, $\tilde r = r\beta d_M/k_M$ and $x = d_N/d_M$ are adimensional
parameters. Interestingly, depending on the ratio $x = d_N/d_M$, two behaviors can be
distinguished for the dependence of $P_M$ on the adimensional rate $\tilde r$, i.e. on the
rate $r$. In the following, we show that for $x < 1$, $P_M$ has a single maximum along the
$\tilde r$-axis, whereas for $x > 1$, $P_M$ is a decreasing function of $\tilde r$.
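
The two regimes can be explored numerically with a short sketch. It assumes that the
misfolding probability takes the adimensional form
$P_M = \tilde r^{-1}\int_0^1 ds\, \exp[-(s + \lambda s^x)/\tilde r]$ and evaluates the integral
with a midpoint rule after the substitution $s = u^2$, which is a numerical convenience and
not part of the original derivation. The function name, the grid of rates and the choice
$\lambda = 1/x$ are illustrative assumptions.

```python
import math

def misfolding_probability(r_tilde, lam, x, n=20000):
    """Midpoint-rule estimate of (1/r~) * int_0^1 exp(-(s + lam*s**x)/r~) ds.

    The substitution s = u**2 (ds = 2u du) smooths the integrand near s = 0.
    """
    h = 1.0 / n
    total = 0.0
    for i in range(n):
        u = (i + 0.5) * h
        s = u * u
        total += 2.0 * u * math.exp(-(s + lam * s ** x) / r_tilde)
    return total * h / r_tilde

rates = [0.05, 0.2, 1.0, 5.0, 20.0]   # adimensional rates (illustrative grid)
pm_x_below_1 = [misfolding_probability(r, lam=2.0, x=0.5) for r in rates]
pm_x_above_1 = [misfolding_probability(r, lam=0.5, x=2.0) for r in rates]

for r, p_lo, p_hi in zip(rates, pm_x_below_1, pm_x_above_1):
    print(f"r~ = {r:6.2f}   P_M(x=0.5) = {p_lo:.4f}   P_M(x=2) = {p_hi:.4f}")
```

On this grid the $x < 1$ column should rise and then fall, while the $x > 1$ column should
decrease monotonically, in line with the statement above.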

The first derivative of $P_M$ with respect to $\tilde r$ (with $\tilde r$, $\lambda$ and $x$ the
adimensional parameters entering Eq. (6)) satisfies:

$$\tilde r^3\, \partial_{\tilde r} P_M = (1 - x)\lambda \int_0^1 ds\, s^x \exp\left(-\frac{s + \lambda s^x}{\tilde r}\right) - \tilde r \exp\left(-\frac{1 + \lambda}{\tilde r}\right) \qquad (7)$$

This clearly shows that when $x > 1$, $\partial_{\tilde r} P_M$ is negative for all (positive)
values of $\tilde r$, i.e. $P_M$ is a decreasing function of $\tilde r$. When $x < 1$, the
analysis is a bit more complicated. Let us show that $\partial_{\tilde r} P_M = 0$ has at least
one solution for $\tilde r > 0$. First, when $\tilde r \to \infty$, it is clear from Eq. (7) that
$\partial_{\tilde r} P_M$ is negative. Second, since $s + \lambda s^x \le 1 + \lambda$ for
$0 \le s \le 1$, the following inequality holds:

$$\int_0^1 ds\, s^x \exp\left(-\frac{s + \lambda s^x}{\tilde r}\right) > \exp\left(-\frac{1 + \lambda}{\tilde r}\right) \int_0^1 ds\, s^x = \frac{1}{1 + x} \exp\left(-\frac{1 + \lambda}{\tilde r}\right) \qquad (8)$$

so that $\tilde r^3 \partial_{\tilde r} P_M$, and hence $\partial_{\tilde r} P_M$, is positive when
$\tilde r \to 0$ (see Eq. (7)). Since $\partial_{\tilde r} P_M$ is a continuous function that is
positive when $\tilde r \to 0$ and negative when $\tilde r \to \infty$, we conclude that
$\partial_{\tilde r} P_M = 0$ has at least one solution for $\tilde r > 0$. One could rigorously
prove that this solution is unique. However, for the sake of brevity, we present here an
argument based on physical grounds. First of all, at large $\tilde r$, $P_M$ decreases when $r$
increases simply because the system does not have enough time to escape from S when the
loading rate becomes too large. On the other hand, the decrease of $P_M$ when $\tilde r \to 0$
reflects the fact that at very large forces, the probability to fold into N is much higher
than the probability to fold into M, both probabilities being very low though. In this case,
the more time is spent at high force values, i.e. the lower $\tilde r$, the less probable it is
to fold into M.

Because $P_M \to 0$ when both $\tilde r \to 0$ and $\tilde r \to \infty$, $P_M$ shows at least one
maximum at intermediate values of $r$. Moreover, in the present case where the location of the
kinetic barriers does not depend on the applied force, we find that there is a single
maximum for $P_M$ when $x < 1$.

Non-absorbing states: the quasi-static regime

In the more realistic case where the states are not absorbing, the dependence of $P_M$
on $r$ has a different nature at low $r$. In this case, fluctuations between M
and N (passing through S) tend to populate N at low forces. Indeed, by definition, the
native state N is supposed to be much more stable than the other states of the system,
namely M and S, at zero force. Consequently, at low $r$ the system has enough time
to populate the native state. In other words, $P_M$ tends to its equilibrium value
$\simeq \exp(-\Delta\Delta G_0 / k_B T)$ when $r \to 0$. In any case (for both $x > 1$ and
$x < 1$), we hence expect that $P_M \to \exp(-\beta \Delta\Delta G_0) \simeq 0$ when $r \to 0$,
where $\Delta\Delta G_0$ is the free energy difference between M and N.

To conclude, in a three-state system with force-independent location of the kinetic
barriers, the misfolding probability $P_M$ always shows a bell shape, as shown
in Fig. 10. However, the maximum may have a different origin depending
on the value of the ratio $x = d_N/d_M$, i.e. depending on the relative distances of the native
and misfolded kinetic barriers to the stretched state.

Force-dependent location of the kinetic barriers 

Numerical simulations in RNAS15 show a complex dependence of the misfolding probability
at the end of a force cycle on the loading rate $r$ (see Appendix C and Fig. 7). This
suggests that RNAS15 cannot be modeled as a three-state system with force-independent
positions of the kinetic barriers along the reaction coordinate. Interestingly, within the
three-state model described above, one can still study numerically the effect of
force-dependent positions of the kinetic barriers on the shape of $P_M(r)$. Physically,
a dependence of $d_N$ and $d_M$ on the force corresponds to structural changes in the
corresponding transition states [26]. In the case of absorbing states N and M, and for a
protocol where the force is released at a constant rate $r$, the probabilities of being in the
different states N, M and S at a given force $f$ read:

$$P_S(f) = \exp\left(-\frac{1}{r}\int_f^\infty dg \left(k_N e^{-\beta d_N(g) g} + k_M e^{-\beta d_M(g) g}\right)\right)$$

$$P_N(f) = \frac{1}{r}\int_f^\infty dg\, k_N \exp\left[-\beta d_N(g) g - \frac{1}{r}\int_g^\infty dh \left(k_N e^{-\beta d_N(h) h} + k_M e^{-\beta d_M(h) h}\right)\right]$$

$$P_M(f) = \frac{1}{r}\int_f^\infty dg\, k_M \exp\left[-\beta d_M(g) g - \frac{1}{r}\int_g^\infty dh \left(k_N e^{-\beta d_N(h) h} + k_M e^{-\beta d_M(h) h}\right)\right] \qquad (9)$$

By playing with the force dependence of $d_N(f)$ and $d_M(f)$ we can obtain different shapes
for the misfolding probability $P_M(f = 0)$ that show several extrema along the $r$-axis.
For instance, we can choose $d_M(f) < d_N(f)$ at low forces and $d_M(f) > d_N(f)$ at high
forces. We then obtain a misfolding probability curve like the one shown in Fig. 11. The
maximum at higher $r$ corresponds to the typical maximum of the force-independent case
$x = d_N/d_M < 1$, whereas the minimum at lower $r$ is due to a crossover from $x < 1$ to
$x > 1$. Interestingly, by solving the master equations (10) (see below) and by imposing
the misfolded structure of RNAS15 to be an absorbing state, we obtain the same kind of
dependence for the misfolding probability. This suggests that in RNAS15, $d_M(f) < d_N(f)$
at low forces. This also suggests that in the non-absorbing case, the low-$r$ regime observed
in the numerical simulations of RNAS15 is the consequence of a quasi-static regime that
tends to populate the native state.
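
A minimal numerical sketch of the absorbing three-state model with force-dependent barrier
positions is given below. It slices the force ramp into small intervals and treats the rates
as constant within each slice, which is an approximation introduced here for simplicity. The
piecewise choice of $d_N(f)$ follows the values quoted in the caption of Fig. 11
($d_N = 0.8\, d_M$ above 5.5 pN, $d_N = 1.25\, d_M$ below, $k_N = k_M = \beta = 1$); the value
$d_M = 1$, the cut-off force and all function names are assumptions.

```python
import math

def three_state_ramp(r, dN_of_f, dM_of_f, kN=1.0, kM=1.0, beta=1.0,
                     f_max=40.0, steps=40000):
    """Release the force from f_max to 0 at rate r; returns (P_N, P_M, P_S) at f = 0.

    N and M are absorbing. Rates are evaluated at the midpoint of each force slice.
    """
    h = f_max / steps
    log_PS, PN, PM = 0.0, 0.0, 0.0
    for i in range(steps):
        g = f_max - (i + 0.5) * h                      # midpoint of the current slice
        aN = kN * math.exp(-beta * dN_of_f(g) * g) / r
        aM = kM * math.exp(-beta * dM_of_f(g) * g) / r
        a = aN + aM
        PS_start = math.exp(log_PS)
        log_PS -= a * h
        lost = PS_start - math.exp(log_PS)             # probability leaving S in this slice
        if a > 0.0:
            PN += lost * aN / a                        # share captured by N
            PM += lost * aM / a                        # share captured by M
    return PN, PM, math.exp(log_PS)

def d_M_const(f):
    return 1.0                                         # assumed value of d_M

def d_N_fig11(f):
    return 0.8 if f > 5.5 else 1.25                    # in units of d_M = 1

for r in [0.01, 0.1, 1.0, 10.0]:                       # illustrative rates
    PN, PM, PS = three_state_ramp(r, d_N_fig11, d_M_const)
    print(f"r = {r:6.2f}   P_N = {PN:.4f}   P_M = {PM:.4f}   P_S = {PS:.4f}")
```

Scanning a finer grid of rates with this function is one way to look for the additional
extrema discussed above; no particular curve shape is claimed for the parameters chosen here.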

C Appendix: Misfolding probability in RNAS15 



In RNAS15, we can estimate the misfolding probability by using the minimal structures
model (MSM, see main text). Within this scheme, each configuration in the MSM can be
labeled by $C_i$, where $i = 1, \dots, N$, $N$ being the total number of MSM configurations.
If $P_i(t)$ is the probability of being in the configuration $C_i$ at time $t$, the dynamics
within the MSM is governed by the following set of master equations:

$$\dot P_i(t) = -\sum_{\langle j \rangle} k^f_{i\to j} P_i(t) + \sum_{\langle j \rangle} k^f_{j\to i} P_j(t) \qquad \forall i \in [1; N] \qquad (10)$$

where $\langle j \rangle$ runs over all the MSM configurations $C_j$ that are connected to $C_i$
via the sequential dynamics described in the Methods (see main text). $k^f_{i\to j}$ and
$k^f_{j\to i}$ are the corresponding force-dependent closing/opening rates (see the Methods
section).

We numerically integrate this system by imposing a force that decreases at a constant rate
$r$, with the following initial condition: the molecule is in the stretched state
($P_i(t = 0) = 1$ if $C_i = S$ and $P_i(t = 0) = 0$ otherwise) and the force is $f = 20$ pN.
The curves we obtain are in good agreement with the experimental results (see Fig. 12).

Numerically, we have checked that our results remain unchanged when using a coarse-grained
description at the level of a few base pairs in order to get results faster (simulations tend
to be very slow when the number of configurations grows). In this case, we use
the following two-state approximation. Let us suppose, for instance, that we coarse-grain
the system of equations (10) at the level of $n_{bp}$ base pairs (typically $n_{bp} = 2, 3$).
If $k_o^*$ and $k_c^*$ are the effective opening and closing rates, then
$k_o^* + k_c^* = \lambda_s$, where $\lambda_s$ is the smallest eigenvalue of the
$n_{bp} \times n_{bp}$ evolution matrix. The detailed balance condition imposes the
value of the ratio $k_o^*/k_c^*$, and hence determines the values of $k_o^*$ and $k_c^*$.
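
The two conditions of the two-state approximation (the sum of the effective rates fixed by
the slowest relaxation eigenvalue, and their ratio fixed by detailed balance) determine both
rates. A minimal sketch, with a hypothetical function name and hypothetical input values:

```python
def effective_rates(lambda_s, open_over_close):
    """Solve k_o + k_c = lambda_s and k_o / k_c = open_over_close.

    Returns (k_o, k_c), the effective opening and closing rates.
    """
    k_c = lambda_s / (1.0 + open_over_close)
    k_o = lambda_s - k_c
    return k_o, k_c

# Hypothetical inputs: slowest relaxation rate 3.0 (arbitrary units) and a
# detailed-balance ratio k_o/k_c = 2.0.
k_o_eff, k_c_eff = effective_rates(3.0, 2.0)
print(f"k_o* = {k_o_eff}, k_c* = {k_c_eff}")
```

In practice the ratio would follow from the free energy difference between the open and
closed coarse-grained states at the current force.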






References 



[1] Doudna, J. A., and T. R. Cech. 2002. The chemical repertoire of natural ribozymes,
Nature 418:222-228.

[2] Moore, P. B., and T. Steitz. 2002. The involvement of RNA in ribosome function, Nature
418:229-235.

[3] Herschlag, D. 1995. RNA chaperones and the RNA folding problem, J. Biol. Chem. 
270:20871-20874. 

[4] Chen, X., and S. L. Wolin. 2004. The Ro 60 kDa autoantigen: insights into cellular 
function and role in autoimmunity, J. Mol. Med. 82:232-239. 

[5] Dobson, C. M. 2002. Protein-misfolding diseases: Getting out of shape, Nature 
418:729-730. 

[6] Bundschuh, R., and T. Hwa. 1999. RNA Secondary Structure Formation: A Solvable 
Model of Heteropolymer Folding, Phys. Rev. Lett. 83:1479-1482. 

[7] Onuchic, J., Z. Luthey-Schulten, and P. G. Wolynes. 1997. Theory of protein folding: 
The Energy Landscape Perspective, Ann. Rev. Phys. Chem. 48:545-600. 

[8] Brion, P., and E. Westhof. 1997. Hierarchy and dynamics of RNA folding, Annu. Rev.
Biophys. Biomol. Struct. 26:113-137.

[9] Tinoco, I. Jr., and C. Bustamante. 1999. How RNA folds, J. Mol. Biol. 293:271-281. 

[10] Zarrinkar, P. P., and J. R. Williamson. 1996. The kinetic folding pathway of the 
Tetrahymena ribozyme reveals possible similarities between RNA and protein folding, 
Nature Struct. Biol. 3:432-438. 

[11] Viasnoff, V., A. Meller, and H. Isambert. 2006. DNA nanomechanical switches under
folding kinetics control, Nano. Lett. 6:101-104. 

[12] Li, P. T. X., C. Bustamante, and I. Jr. Tinoco. 2007. Real-time control of the energy 
landscape by force directs the folding of RNA molecules, Proc. Nat. Acad. Sci. USA 
104:7039-7044. 

[13] Ritort, F. 2006. Single molecule experiments in biological physics: methods and
applications, Journal of Physics: Condensed Matter 18:R531-R583.

[14] Hyeon, C., and D. Thirumalai. 2005. Mechanical unfolding of RNA hairpins. Proc.
Nat. Acad. Sci. USA 102:6789-6794. 

[15] Wu, M., and I. Jr. Tinoco. 1998. RNA folding causes secondary structure
rearrangement, Proc. Nat. Acad. Sci. USA 95:11555-11560.

[16] Onoa, B., D. Dumont, J. Liphardt, S. B. Smith, I. Jr. Tinoco, and C. Bustamante. 
2003. Identifying Kinetic Barriers to Mechanical Unfolding of the T. thermophila Ri- 
bozyme, Science 299:1892-1895. 

[17] Collin, D., F. Ritort, C. Jarzynski, S. B. Smith, I. Jr. Tinoco, and C. Bustamante. 
2005. Verification of the Crooks fluctuation theorem and recovery of RNA folding free 
energies, Nature 437:231-234. 

[18] Liphardt, J., B. Onoa, S. B. Smith, I. Jr. Tinoco, and C. Bustamante. 2001. Re- 
versible Unfolding of Single RNA Molecules by Mechanical Force, Science 292:733-737. 

[19] Smith, S. B., Y. Cui, and C. Bustamante. 2003. Optical-Trap Force Transducer
that Operates by Direct Measurement of Light Momentum, Methods in Enzymology 
361:134-162. 






[20] Experimentally, the control parameter is the distance between the center of the trap 
and the tip of the micropipette (see Methods) and the system is pulled at a constant 
pulling speed. However, at forces larger than 5 pN the loading rate is approximately 
constant and equal to the trap stiffness times the pulling speed, see Evans, E., and K. 
Ritchie. 1997. Dynamic strength of molecular adhesion bonds, Biophys. J. 72:1541- 
1555. 

[21] Cocco, S., R. Monasson, and J. Marko. 2003. Slow nucleic acid unzipping from 
sequence-defined barriers, Eur. Phys. J. E 10:153-161. 

[22] Cocco, S., R. Monasson, and J. Marko. 2001. Force and kinetic barriers to unzipping 
of the DNA double helix, Proc. Natl. Acad. Sci. USA 98:8608-8613. 

[23] Manosas, M., J.-D. Wen, P. T. X. Li, S. B. Smith, C. Bustamante, I. Jr. Tinoco, 
and F. Ritort. 2007. Force Unfolding Kinetics of RNA using Optical Tweezers. II. 
Modeling Experiments. Biophys. J. 92:3010-3021. 

[24] Gueron, M., and J. L. Leroy. 1992. Base pair opening in double-stranded nucleic
acids. Nucleic Acids and Molec. Biol. 6:1-22. 

[25] Hofacker, I. L. 2003. Vienna RNA secondary structure server, Nucleic Acids
Research 31:3429-3431.

[26] Manosas, M., D. Collin, and F. Ritort. 2006. Force-dependent fragility in RNA hair- 
pins, Phys. Rev. Lett. 96:218301-04. 

[27] Tinoco, I. Jr. 1993. In The RNA World, R. F. Gesteland and J. F. Atkins editors, 
Cold Spring Harbor Laboratory Press, 603-607. 

[28] Baumann, C., S. B. Smith, V. Bloomfield, and C. Bustamante. 1997. Ionic effects on
the elasticity of single DNA molecules, Proc. Nat. Acad. Sci. USA 94:6185-6190. 

[29] Maier, B., D. Bensimon, and V. Croquette. 2000. Replication by a single DNA poly- 
merase of a stretched single-stranded DNA, Proc. Nat. Acad. Sci. USA 97:12002-12007. 

[30] Flory, P.J. 1969. In Statistical mechanics of chain molecules, appendix G, Oxford 
University Press, NY. 

[31] Smith, S. B., Y. Cui, and C. Bustamante. 1996. Overstretching B-DNA: the elastic 
response of individual double-stranded and single-stranded DNA molecules, Science 
271:795-799. 

[32] Dudko, O. K., G. Hummer, and A. Szabo. 2006. Intrinsic rates and activation
energies from single-molecule pulling experiments, Phys. Rev. Lett. 96:108101.






List of figures 



Figure 1: Major and minor force-extension curves.

(Color online) Leftmost panel: Optical tweezers experimental setup for single RNA
manipulation (figure not to scale). Rightmost panel: Experimental major and minor unfolding
curves obtained from RNAS15 pulling experiments with optical tweezers [17]. The
reported extension corresponds to the end-to-end distance of the RNA molecule plus the
DNA/RNA hybrid handles.

Figure 2: Unfolding of the native structure.

(Color online) Leftmost panel: The RNAS15 three-helix junction native structure, composed
of a stem S (green) that branches into two hairpin loops $H_1$ (orange) and $H_2$ (purple).
Free energy of formation of the native state [25]: $\Delta G = -34.3$ kcal/mol $= -57\, k_B T$ at
room temperature (298 K). Rightmost panel: Experimental major unfolding curves compared
with numerical results obtained from the sequential unfolding of the native structure
(see text for details about the simulation procedure).

Figure 3: Unfolding of the misfolded structure.

(Color online) (a): The most stable structure without stem (called, in this paper, the
misfolded structure) is composed of two hairpins: $H_1^M$ (orange) and $H_2^M$ (red). Its free
energy of formation is equal to $\Delta G_1 = -29$ kcal/mol $= -48.3\, k_B T$. (b): Experimental
minor unfolding curves compared with numerical results obtained from the sequential
unfolding of the misfolded structure on the left. (c): Curves obtained from sequential
simulations (see text) of the unfolding of the individual hairpins $H_1^M$ and $H_2^M$ that
compose the misfolded structure. Continuous lines represent a low-bandwidth average of the
force-extension data.

Figure 4: The minimal structures model (MSM).

(Color online) Upper panel: Schematic representation of the sequential model for
multi-hairpin structures. The only allowed transitions are the opening and closing of the base
pairs located at the base of the hairpins (shown as thick bonds) where the force is applied.
Lower panel: How to build the ensemble of configurations of the MSM. The intermediate
configurations resulting from the sequential unfolding of either N or M are composed of
hairpins and regions of unpaired bases (shown in blue). Then, the final MSM ensemble
results from the combination of all the different hairpins and unpaired regions. In the
example shown here, two hairpins (A and B) are combined together to form a configuration
where the two original hairpins are separated by a region of unpaired bases.

Figure 5: Free energy corrections.

(Color online) Upper panel: schematic representation of a generic configuration $C$
of the MSM. It is composed of hairpins and regions of unpaired bases. The free energy
correction of a given configuration $C$ at force $f$, $G_c(C, f)$, is given by the sum of the
independent free energy contributions coming from all the different unpaired regions. Lower
panel: function $F(n, f)$, defined as the free energy of an n-bases polynucleotide chain
minus the mechanical free energy of the fully extended chain, averaged over all possible
segments of that length $n$ along the RNAS15 sequence. We find that $F(n, f)$ is
approximately linear in $n$, $F(n, f) \simeq a_f (n - 5)$. The coefficient $a_f$ as a function
of the force is plotted in the inset of the figure.

Figure 6: Dynamic force spectroscopy results.

(Color online) Experimental results compared to numerical simulations in the MSM. The
MSM parameters are: $A = 0.3\, k_B T$ and $k_a = 10^7$ s$^{-1}$. (a): Unfolding and refolding
major curves at loading rate $r \simeq 20$ pN s$^{-1}$. (b): Distribution of breakage forces,
i.e. the force at which the molecule unfolds, obtained from the major unfolding curves at
$r \simeq 20$ pN s$^{-1}$ (distributions have been obtained from 900 (2000) trajectories in the
experiments (simulations)) and $r \simeq 12$ pN s$^{-1}$ (distributions have been obtained from
400 (2000) trajectories in the experiments (simulations)).

Figure 7: Misfolding probability and three-state model.

(Color online) Upper panel: Representation of the three-state model including the stretched,
native and misfolded states. The misfolded state acts as a kinetic trap for the folding
transition between the stretched state and the native state. Lower panel: Misfolding
probability (computed at the end of the relaxing process) as a function of the unloading
rate. The experimental points correspond to $r = 20$ pN s$^{-1}$ and $r = 12$ pN s$^{-1}$.

Figure 8: (Appendix A)

(Color online) Free energy as a function of the number of opened base pairs for the two
hairpins forming the M structure, $H_1^M$ (red) and $H_2^M$ (blue), at the critical force where
both the folded and the unfolded hairpins are equally stable (critical force values are
around 10 and 11 pN for $H_1^M$ and $H_2^M$, respectively). Results shown are obtained by using
the Vienna package [25].

Figure 9: (Appendix B)

Three-state model with three states N, M, S. S is an intermediate state on-pathway from
the misfolded to the native state. The four possible rates $k_{a\to b}$ are also shown.

Figure 10: (Appendix B)

(Color online) Misfolding probability $P_M$ as a function of the adimensional rate $\tilde r$
for the three-state model with force-independent positions of the kinetic barriers and
native/misfolded absorbing states. The full curves have been obtained by numerically
integrating Eq. (6) with $\lambda = 1/x$ so that $k_N = k_M$. The dashed curves show the
corresponding case where the native/misfolded states are non-absorbing. In this case, we
denote by $d'_N$ and $d'_M$ the distances from states N and M, respectively, to the position
along the reaction coordinate of the kinetic barrier separating these states from S. The
curves have been obtained using $k_{N\to S} = \exp(-\beta(\Delta G_0 + f d'_N))$ and
$k_{M\to S} = \exp(-\beta(\Delta G_1 + f d'_M))$ with $k_N = k_M = \beta = d'_N = d'_M = 1$.
$\Delta G_0 = 20$ and $\Delta G_1 = 10$ correspond to the free energies of formation of the
native and misfolded states, respectively, at zero force.

Figure 11: (Appendix B)



Misfolding probability $P_M$ as a function of the rate $r$ in the case of a force-dependent
position of the barrier between the native and the stretched state in the three-state model
with absorbing native/misfolded states. The curve has been obtained by taking
$d_N = 0.8\, d_M$ for $f > 5.5$ pN and $d_N = 1.25\, d_M$ for $f < 5.5$ pN. We have also used
$k_N = k_M = \beta = 1$.

Figure 12: (Appendix C)

(Color online) Misfolding probability obtained from the set of master equations (10)
describing the folding kinetics in the MSM. The dashed black lines correspond to the case
where the misfolded state is absorbing.